level ii
Evaluating Large Language Models for Financial Reasoning: A CFA-Based Benchmark Study
Yao, Xuan, Wang, Qianteng, Liu, Xinbo, Huang, Ke-Wei
The rapid advancement of large language models (LLMs) presents significant opportunities for financial applications, yet systematic evaluation in specialized financial contexts remains limited. This study presents the first comprehensive evaluation of state-of-the-art LLMs using 1,560 multiple-choice questions from official mock exams across Levels I-III of the CFA Program, one of the most rigorous professional certifications globally, whose exams mirror the complexity of real-world financial analysis. We compare models distinguished by their core design priorities: multi-modal and computationally powerful, reasoning-specialized and highly accurate, and lightweight and efficiency-optimized. We assess the models under zero-shot prompting and through a novel Retrieval-Augmented Generation (RAG) pipeline that integrates official CFA curriculum content. The RAG system retrieves domain-specific knowledge precisely through hierarchical knowledge organization and structured query generation, significantly improving reasoning accuracy on professional financial certification questions. Results reveal that reasoning-oriented models consistently outperform the others in zero-shot settings, while the RAG pipeline provides substantial improvements, particularly for complex scenarios. Comprehensive error analysis identifies knowledge gaps as the primary failure mode, with minimal impact from text readability. These findings provide actionable insights for LLM deployment in finance, offering practitioners evidence-based guidance for model selection and cost-performance optimization.
- Education (1.00)
- Banking & Finance > Trading (1.00)
Can GPT models be Financial Analysts? An Evaluation of ChatGPT and GPT-4 on mock CFA Exams
Callanan, Ethan, Mbakwe, Amarachi, Papadimitriou, Antony, Pei, Yulong, Sibue, Mathieu, Zhu, Xiaodan, Ma, Zhiqiang, Liu, Xiaomo, Shah, Sameena
Large Language Models (LLMs) have demonstrated remarkable performance on a wide range of Natural Language Processing (NLP) tasks, often matching or even beating state-of-the-art task-specific models. This study aims to assess the financial reasoning capabilities of LLMs. We leverage mock exam questions of the Chartered Financial Analyst (CFA) Program to conduct a comprehensive evaluation of ChatGPT and GPT-4 in financial analysis, considering Zero-Shot (ZS), Chain-of-Thought (CoT), and Few-Shot (FS) scenarios. We present an in-depth analysis of the models' performance and limitations, and estimate whether they would have a chance of passing the CFA exams. Finally, we outline insights into potential strategies and improvements to enhance the applicability of LLMs in finance. From this perspective, we hope this work paves the way for future studies to continue enhancing LLMs for financial reasoning through rigorous evaluation.
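The three prompting scenarios named in the abstract above (ZS, CoT, FS) differ mainly in how the prompt is assembled. The following is a minimal sketch of that difference, not the authors' actual prompts: the function names, the prompt wording, and the example dictionary layout are all assumptions made for illustration.

```python
from typing import Optional, Sequence

LETTERS = "ABCDEFGH"


def format_item(question: str, choices: Sequence[str]) -> str:
    """Render one multiple-choice question with lettered options."""
    options = "\n".join(f"{LETTERS[i]}. {c}" for i, c in enumerate(choices))
    return f"Question: {question}\n{options}"


def build_prompt(
    question: str,
    choices: Sequence[str],
    mode: str = "zs",
    examples: Optional[Sequence[dict]] = None,
) -> str:
    """Assemble a prompt for one question (hypothetical wording).

    mode="zs":  the question alone, followed by an answer cue.
    mode="cot": the question plus an instruction to reason step by step.
    mode="fs":  solved example questions first, then the target question.
    """
    if mode not in {"zs", "cot", "fs"}:
        raise ValueError(f"unknown mode: {mode}")
    parts = []
    if mode == "fs":
        # Each example is a dict with "question", "choices", "answer" keys
        # (an assumed layout, chosen for this sketch only).
        for ex in examples or []:
            solved = format_item(ex["question"], ex["choices"])
            parts.append(f"{solved}\nAnswer: {ex['answer']}")
    if mode == "cot":
        tail = "Think step by step, then state the final answer as a single letter."
    else:
        tail = "Answer:"
    parts.append(format_item(question, choices) + "\n" + tail)
    return "\n\n".join(parts)
```

Calling `build_prompt(q, choices, mode="cot")` yields the same question text as the zero-shot form, with only the closing instruction changed, which keeps the comparison between scenarios limited to the prompting strategy itself.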
Artificial intelligence algorithms appear to be better at detecting skin cancer
Researchers have shown for the first time that a form of artificial intelligence or machine learning known as a deep learning convolutional neural network (CNN) is better than experienced dermatologists at detecting skin cancer. In a study published in the leading cancer journal Annals of Oncology today (Tuesday), researchers in Germany, the USA and France trained a CNN to identify skin cancer by showing it more than 100,000 images of malignant melanomas (the most lethal form of skin cancer), as well as benign moles (or nevi). They compared its performance with that of 58 international dermatologists and found that the CNN missed fewer melanomas and misdiagnosed benign moles less often as malignant than the group of dermatologists. A CNN is an artificial neural network inspired by the biological processes at work when nerve cells (neurons) in the brain are connected to each other and respond to what the eye sees. The CNN is capable of learning fast from images that it "sees" and teaching itself from what it has learned to improve its performance (a process known as machine learning).
- North America > United States (0.25)
- Europe > France (0.25)
- Oceania > Australia > Victoria > Melbourne (0.05)
- Health & Medicine > Therapeutic Area > Oncology > Skin Cancer (1.00)
- Health & Medicine > Therapeutic Area > Dermatology (1.00)
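The comparison reported in the summary above has two parts: "missed fewer melanomas" corresponds to higher sensitivity, and "misdiagnosed benign moles less often as malignant" corresponds to higher specificity. A minimal sketch of how these two rates are computed from binary labels follows. The function name and the 1 = malignant, 0 = benign encoding are assumptions for illustration, and no figures from the study are reproduced here.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels.

    Assumed encoding: 1 = malignant melanoma, 0 = benign mole.
    Sensitivity = share of true melanomas that were flagged.
    Specificity = share of benign moles correctly left unflagged.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # melanomas caught
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # melanomas missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # benign, cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # benign, flagged
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity
```

A reader outperforms another on both criteria only if both rates are higher; a classifier that flags every lesion as malignant reaches perfect sensitivity at zero specificity, which is why the two are reported together.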
Artificial Intelligence can detect skin cancer better than dermatologists
An artificial intelligence system can better detect skin cancer than experienced dermatologists, a study has found. Researchers trained a form of artificial intelligence or machine learning known as a deep learning convolutional neural network (CNN) to identify skin cancer by showing it more than 100,000 images of malignant melanomas (the most lethal form of skin cancer), as well as benign moles (or nevi). They compared its performance with that of 58 international dermatologists and found that the CNN missed fewer melanomas and misdiagnosed benign moles less often as malignant than the group of dermatologists. "The CNN works like the brain of a child. To train it, we showed the CNN more than 100,000 images of malignant and benign skin cancers and moles and indicated the diagnosis for each image," said Holger Haenssle, from the University of Heidelberg in Germany.
- Health & Medicine > Therapeutic Area > Oncology > Skin Cancer (1.00)
- Health & Medicine > Therapeutic Area > Dermatology (1.00)
AI can detect skin cancer better than doctors now
BERLIN: An artificial intelligence system can better detect skin cancer than experienced dermatologists, a study has found. Researchers trained a form of artificial intelligence or machine learning known as a deep learning convolutional neural network (CNN) to identify skin cancer by showing it more than 100,000 images of malignant melanomas (the most lethal form of skin cancer), as well as benign moles (or nevi). They compared its performance with that of 58 international dermatologists and found that the CNN missed fewer melanomas and misdiagnosed benign moles less often as malignant than the group of dermatologists. "The CNN works like the brain of a child. To train it, we showed the CNN more than 100,000 images of malignant and benign skin cancers and moles and indicated the diagnosis for each image," said Holger Haenssle, from the University of Heidelberg in Germany.
- Health & Medicine > Therapeutic Area > Oncology > Skin Cancer (1.00)
- Health & Medicine > Therapeutic Area > Dermatology (1.00)